FOREWORD



Since 1977 The I.N. Thut World Education Center (TWEC) has issued
World Education Monographs. They are usually the products of student or
faculty research, or the texts of presentations that were made at
conferences or colloquia. A complete list of the sixteen monographs that
have been issued to date is available from TWEC.

The 1982-1983 titles in the series were: 

AL-AZHAR: A UNIVERSITY BETWEEN TWO AGES.
Mohamed Misbah, 1983. 13 pp.

THE EGYPTIAN EXPERIENCE IN EDUCATION.
Mohamed Misbah, 1983. 20 pp.

CULTURAL CONFLICTS IN CARRYING OUT A CHILD FEEDING PROJECT IN PARAGUAY:
THE ITACURUBI DEL ROSARIO EXPERIENCE OF BETTY KEENEY, 1945-1947.
(Double Issue) Kay Hill, 1983. 60 pp. 

Copies of these publications may be obtained by sending a check for
$1.50 (except for the last title, which costs $3.00) made out to The I.N.
Thut World Education Center for each monograph ordered. Please add $1 to
cover postage and handling for orders of fewer than three items, or $2 for
more than three. All orders must be prepaid or on official purchase
forms. Discounts are not offered to booksellers or purchasing agents,
and TWEC cannot accept returns or make refunds.

In addition to the present monograph, the 1983-1984 series is 
scheduled to include: 

CONTEMPORARY EDUCATION IN GHANA, WEST AFRICA. 

Ellen Segbefia 

COMPARING PUBLIC AND PRIVATE SCHOOLS IN GHANA: THE PUZZLING ROLE
OF SELECTIVITY BIAS.
Bernard Kodwo Hayford



THE INFLUENCE OF ECONOMIC AND OTHER FACTORS ON THE QUALITY OF THE 
TEACHER WORKFORCE: A CASE STUDY OF GHANA. 
Bernard Kodwo Hayford

The author of the present monograph, Dr. Richard H. Pfau, is a native 
of Baltimore, Maryland. He received his undergraduate degree from the 
University of Baltimore, and earned his doctorate in International Develop- 
ment Education at the University of Pittsburgh. Dr. Pfau worked for more 
than seven years in Nepal as a member of the Peace Corps, and under AID 
auspices. He is married, and the Pfaus have two children. They now make 
their home in Mansfield Center, Connecticut. 



ABSTRACT 



This monograph reviews studies which have measured and
compared classroom and other human behaviors occurring
in different cultures and nations, points out problems
related to the comparisons made, and describes procedures
which can be used to help standardize measurements of
behavior made using systematic observation instruments.
Standardization is considered to be achieved when behaviors
are classified the same way by different observers who
use an instrument, and when the measurements which result
have scalar identity and are free of systematic observation
errors. Procedures discussed include using observers
from each culture studied, and preserving instrument
descriptions, samples of the behaviors studied, and associated
standard measurements of those behaviors for reference
by others. Areas needing additional research and thought
are also highlighted.
Standardizing Behavioral Measurements
Across Cultures, Nations, and Time 

Standardized behavioral measurements are lacking in
the social sciences. Although institutes, associations,
and hundreds of standards have been established to
facilitate the making of physical measurements for
engineering, physical science, and business purposes,
virtually no mechanisms or accepted standards exist to help
social scientists make more comparable measurements of
behavior. As a result, studies of behavior often lack
precision and validity, scholars are hampered in their
ability to communicate with one another, and the development
and testing of social science theory is hindered (Moles,
1977; Triandis, 1977, p. 10; Johnson, 1978; Nunnally, 1978,
pp. 6-10).

This monograph explains procedures which can be used to
help standardize measurements of behavior made using
systematic observation techniques, discusses problems
encountered when such techniques are used to measure and
compare naturally occurring behaviors across cultures and
time, and indicates areas of inquiry which, if pursued,
could provide information useful to comparative scholars
wishing to standardize behavioral measurements.
Category Systems

Behavior can be observed and measured in many ways.
Relatively indirect ways include the use of questionnaires,
interviews, and diaries to obtain information from persons
about their own behavior or the behavior of others. More
direct ways include having trained persons directly observe
behaviors of interest and record their observations by
writing narrative descriptions, or by means of rating
systems, checklists, or other observation instruments.

One systematic technique which provides an especially 
promising basis for making precise and valid cross-cultural 
comparisons of behavior involves the use of "category 
systems" (Pfau, 1976, 1980). This is the technique upon 
which the present discussion is focused. 

Category systems are systematic observation instruments
which are characterized by two major features: (a)
clearly specified, well defined categories of behavior to
be measured, and (b) objective means for recording the
occurrence of those behaviors, such as counting methods or
the use of timing devices (as indicated in Table 1).
Observers using these instruments make records of behaviors
observed as those behaviors occur or within a few seconds
afterwards. Alternatively, records may be made at a later
time, by viewing films, videotapes, or other preserved
samples of behaviors to be measured.
TABLE 1

CATEGORY SYSTEM RECORDING METHODS

Recording method(a)        Distinguishing characteristics        Accuracy

Event method (also: frequency recording; event recording)
    Distinguishing characteristics: A record is made each time a
    behavior of interest occurs.
    Accuracy: Potentially high

Duration method
    Distinguishing characteristics: A cumulative stopwatch or other
    timing device is started when the behavior of interest begins and
    is stopped when the behavior ends. Alternatively, the beginning and
    ending times of behaviors are recorded on paper or on a special
    recording instrument.
    Accuracy: Potentially high

Instantaneous time method
    Distinguishing characteristics: Records are made of the behaviors
    occurring at exact instants of time. These instants are often
    separated by fixed periods of time, such as 30 seconds or 5 minutes.
    Accuracy: Potentially high

Interval method(b) (also: interval time sampling; partial-interval
time sampling; one-zero sampling)
    Distinguishing characteristics: The observation period is divided
    into small intervals of time, lasting from 3 seconds to 15 seconds
    or more. Recordings are made to indicate whether behaviors of
    interest were observed to occur during each time interval.
    Accuracy: Variable

Checklist(c)
    Distinguishing characteristics: Same as the "interval method" but
    with a relatively larger time interval.
    Accuracy: Variable (generally lower than the other methods listed
    above)

Specimen record(c) (also: ad libitum sampling)
    Distinguishing characteristics: A detailed narrative or shorthand
    description is made of behavior as it is observed. Later, the
    occurrences of specific behaviors of interest are counted or
    otherwise classified.
    Accuracy: Apparently lower than most other methods described above

(a) The first four major classifications shown are based on Jackson,
Della-Piana, & Sloane (1975). Alternative names and slight variations
of the major methods are also indicated. See Jackson et al. (1975) and
Altmann (1974) for details of the first four methods described.

(b) The interval method should be used with caution; it is subject to
differentially distorting measurements of behaviors observed in
different cultures.

(c) These are variations of category systems.
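To make the differences among the first four recording methods in Table 1 concrete, the following Python sketch applies each of them to the same observed behavior stream. It is an illustration only, not a procedure from the monograph: it assumes the behavior of interest has been recorded as (start, end) bouts in seconds, and the function names, the 10-second sampling interval, and the sample data are all hypothetical. Partial-interval scoring is used for the interval method; other interval variants score intervals differently.

```python
"""Illustrative sketch: four Table 1 recording methods applied to one behavior stream.

Assumption (hypothetical): the behavior of interest is available as a list of
(start, end) bouts in seconds within an observation period of known length.
"""
from typing import List, Tuple

Bout = Tuple[float, float]  # (start second, end second), with end > start


def event_count(bouts: List[Bout]) -> int:
    """Event method: one record per occurrence of the behavior."""
    return len(bouts)


def total_duration(bouts: List[Bout]) -> float:
    """Duration method: cumulative time during which the behavior occurred."""
    return sum(end - start for start, end in bouts)


def instantaneous_proportion(bouts: List[Bout], period: float, every: float) -> float:
    """Instantaneous time method: proportion of sampled instants (every, 2*every,
    ..., up to period) at which the behavior was occurring."""
    instants = [every * k for k in range(1, int(period // every) + 1)]
    hits = sum(any(start <= t < end for start, end in bouts) for t in instants)
    return hits / len(instants)


def partial_interval_proportion(bouts: List[Bout], period: float, width: float) -> float:
    """Interval method, partial-interval scoring: proportion of intervals in which
    the behavior occurred at any moment, however briefly."""
    n_intervals = int(period // width)
    scored = 0
    for k in range(n_intervals):
        lo, hi = k * width, (k + 1) * width
        if any(start < hi and end > lo for start, end in bouts):
            scored += 1
    return scored / n_intervals


if __name__ == "__main__":
    bouts = [(2, 4), (31, 33), (61, 64)]  # hypothetical 90-second observation
    print(event_count(bouts))                                    # 3 occurrences
    print(total_duration(bouts))                                 # 7 seconds
    print(instantaneous_proportion(bouts, 90, 10))               # 0.0
    print(round(partial_interval_proportion(bouts, 90, 10), 3))  # 0.333
```

In this example the behavior actually occupied about 8 percent of the period (7 of 90 seconds), yet sampling at 10-second instants missed it entirely while partial-interval scoring credited a third of the intervals, which suggests why Table 1 rates the accuracy of the interval method as variable.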
Often, observers who use these instruments are trained
until their records agree highly with those made by experts
in the use of an instrument. High agreement, when reached,
indicates that the process of measurement is unambiguous
and that a "standard language" has been applied by the
different observers to describe behaviors observed.
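The monograph does not prescribe a particular index of agreement. As one common choice, assumed here for illustration, the sketch below computes simple percentage agreement and Cohen's kappa (which discounts the agreement expected by chance) for two observers who have coded the same sequence of behavioral units. The category labels and data are hypothetical.

```python
"""Sketch: agreement between an expert's and a trainee's codings of the same units.

The choice of indices (percentage agreement, Cohen's kappa) is an assumption;
the category codes below are hypothetical.
"""
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Proportion of units that the two observers coded identically."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("codings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa: observed agreement corrected for the agreement expected
    by chance from each observer's marginal category frequencies."""
    p_observed = percent_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both observers used one and the same category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    # Hypothetical codes: L = teacher lecturing, Q = teacher questioning, P = pupil talk
    expert = ["L", "L", "Q", "P", "L", "Q", "L", "P"]
    trainee = ["L", "L", "Q", "L", "L", "Q", "P", "P"]
    print(percent_agreement(expert, trainee))       # 0.75
    print(round(cohens_kappa(expert, trainee), 2))  # 0.6
```

What level of agreement counts as "high," and whether a chance-corrected index is appropriate, remain judgments the investigator must make and report.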

The potential suitability of category systems for 
helping to make cross-cultural comparisons of behavior is a 
result of the explicit classification and objective recording
procedures associated with use of these instruments. These
procedures lend themselves to being used by observers of 
differing cultural backgrounds to make standardized 
measurements of behavior. Such standardized measurements, 
in turn, provide a basis for making precise and valid 
cross-cultural comparisons (see Pfau, 1980, for details). 



Standardization
Before going further, let me explain what I mean by
standardization of observation instrument usage.
Standardization of measurements made using these instruments is
considered to be achieved when the following three conditions
are met: (a) when behaviors observed are classified
the same way by different persons using an instrument's
categories, (b) when measurements of the behaviors
classified are made using the same metric so that scalar
identity is achieved across occasions in which the
instrument is used, and (c) when systematic measurement
errors do not occur.

The first condition requires that a "standard language"
be shared and used by different observers to classify events
they observe. The second condition means that when an
instrument is used to make measurements of behaviors
occurring in different locations, cultures, or at different
times, the measurements obtained will represent quantitatively
identical scales (Poortinga, 1975). That is, differences
in the measurements made will represent actual differences
in the extent to which behaviors observed occurred, while
equal measurements will indicate equal magnitudes of
behaviors observed (within limits imposed by random errors
of measurement). Such scalar identity signifies not only
that an instrument measures the same attributes in different
cultures but that the same quantitative scale is used in
each culture to measure those attributes. The third
condition requires that biases not affect the measurements
made in ways that cause them to differ systematically
from the hypothetical "true values" of the behaviors
observed (Schumacher, 1981). This means, for example, that
observers will not make different measurements due to
differing sensitivities to behavioral subtleties in one
or more of the cultures studied (Longabaugh, 1980, p. 105;
Moore, 1969, p. 255; Schweizer, 1978, pp. 134-135), and
that time unit distortions, which are sometimes associated
with use of the interval recording method, will not occur
(Pfau, 1981).
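The kind of systematic error the third condition rules out can be illustrated with the interval method. The hypothetical sketch below, which assumes partial-interval scoring with 10-second intervals, compares two groups whose behavior of interest occupies exactly the same total time (30 of 300 seconds) but is distributed differently: many brief bouts in one group and a single long bout in the other. It illustrates how such a distortion can arise; it is not a reproduction of the analysis in Pfau (1981).

```python
"""Sketch: how partial-interval scoring can distort measurements differentially.

Hypothetical data: two groups show the same total duration of a behavior,
but with different bout lengths. Interval width and bouts are assumptions.
"""
from typing import List, Tuple

Bout = Tuple[float, float]  # (start second, end second)


def partial_interval_proportion(bouts: List[Bout], period: float, width: float) -> float:
    """Proportion of fixed intervals in which the behavior occurred at any moment."""
    n_intervals = int(period // width)
    scored = sum(
        any(start < (k + 1) * width and end > k * width for start, end in bouts)
        for k in range(n_intervals)
    )
    return scored / n_intervals


def true_proportion(bouts: List[Bout], period: float) -> float:
    """Proportion of the observation period actually occupied by the behavior."""
    return sum(end - start for start, end in bouts) / period


PERIOD, WIDTH = 300, 10
brief_bouts = [(30 * k, 30 * k + 3) for k in range(10)]  # ten 3-second bouts
long_bout = [(0, 30)]                                    # one 30-second bout

if __name__ == "__main__":
    for label, bouts in [("group 1", brief_bouts), ("group 2", long_bout)]:
        print(label,
              round(true_proportion(bouts, PERIOD), 3),
              round(partial_interval_proportion(bouts, PERIOD, WIDTH), 3))
    # group 1 0.1 0.333
    # group 2 0.1 0.1
```

Both groups spend 10 percent of the period in the behavior, yet their interval scores differ by a factor of more than three; comparing those scores would suggest a group difference that is an artifact of the recording method.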

Past Usage of Category Systems

A number of researchers have used category systems to
measure and compare naturally occurring behaviors in
different cultures and nations.

Investigations in which these instruments were used to
compare behaviors occurring in different countries have
included studies of parent and child behaviors in Japan and
the U.S.A. (Caudill & Weinstein, 1969; Caudill & Frost,
1974) and in Yugoslavia and the U.S.A. (Lewis & Ban,
1977); differences in infant separation protest in
Guatemala and in the U.S.A. (Lester et al., 1974); and
child-holding patterns in different societies (Richards &
Finger, 1975). A number of researchers have also studied
similarities and differences in the classroom behaviors of
teachers and students in different countries. Tisher (1970)
compared Australian, U.S., and New Zealand teacher behaviors,
while other studies compared U.S. teaching with that
occurring in Great Britain (Birrell, 1974), in the Bahamas
(Ray & Ray, 1976), and in the Kingdom of Nepal (Pfau, 1977).
Category systems have also been used to study the social
patterns of urban pedestrians in Middle Eastern and
Western countries (Berkowitz, 1971), sexual differences in
methods of carrying books by students in several Central
American and North American countries (Jenni, 1976), and
nonverbal behaviors during conversations in Germany, Italy,
and the U.S.A. (Shuter, 1977).
Studies conducted within single countries of different
cultural and subcultural groups, in which category systems
were used, have included comparisons of proxemic behaviors
during conversations of Arab and American students (Watson
& Graves, 1966), of Anglo-, Black-, and Mexican-Americans
observing animals at a U.S. zoo (Baxter, 1970), and during
interactions of black, Puerto Rican, and white student
dyads on school playgrounds around New York City (Aiello &
Jones, 1971). In addition, interactions between members
of nine different ethnic groups at the University of Guam
have been studied (Brislin, 1971), as have nonverbal
behaviors of Protestant Americans of Anglo-Saxon descent
and of American Jews (Shuter, 1979), the behaviors of
mothers and children from different social classes and
cultural groups in Israel (Greenbaum & Landau, 1977) and
in the U.S.A. (Tulkin & Cohler, 1973; Tulkin, 1977; and
Moss & Jones, 1977), and classroom behaviors in Amish and
non-Amish schools in the U.S.A. (Payne, 1970).

Studies in which category systems have been used to 
study the behaviors of a single cultural group within a 
single country have been even more numerous. 

Variations of category systems have also been used in
several significant cross-cultural studies. The most
extensive and influential of these was the "Six Cultures
Study," in which child-rearing and child behaviors in
different cultures were described and compared (Whiting &
Whiting, 1975). The approach used included having observers
write extensive running accounts (called protocols) of
behaviors which occurred and the contexts of those behaviors,
rewriting the accounts in more and clearer detail as soon as
possible, and then later coding the descriptions using a
number of behavioral categories. Similar approaches, in
which the running accounts have sometimes been called
"specimen records," have included studies of behaviors
occurring in an American and an English town (Barker &
Barker, 1963, 1978; Schoggen, Barker, & Barker, 1963, 1978),
and of child behaviors in Japan and in the U.S.A. (Caudill
& Schooler, 1973).

Another variation of category systems, the checklist,
was used to study and compare science teaching in Britain
and in Canada (Hacker, Hawkes, & Heffernan, 1979).

A major accomplishment of these studies has been to
demonstrate the range of theoretical and heuristic concerns
and the diversity of cultural and behavioral situations to
which the systematic study of naturally occurring behavior
is applicable. They have stimulated thinking about what can
be done, and provided a basis upon which future work and
thinking can build.

For the most part, however, these studies represent
only an incomplete beginning to the standardization of
behavioral measurements across cultures, nations, and time,
for reasons now to be discussed.
Limitations of Past Category System Usage
A researcher studying behaviors occurring in different
cultures, or in the same culture at different times, may
wish to compare existing data gathered by others who used
a particular category system. Or, he or she may conduct or
coordinate studies aimed at measuring behaviors in cultures
of interest, and then compare the measurements obtained.
A major consideration of such researchers should be the
degree to which the measurements compared are standardized.

Table 2 indicates some ways of trying to help ensure 
that standardization of category system and other observa- 
tion instrument usage is achieved across cultures. These 
ways range from what this writer and others consider to be 
a "rigorous approach" (Longabaugh, 1980, pp. 104 & 106; 
Brislin, 1980, pp. 408-409; Campbell, 1970, pp. 70-71), to
much more questionable approaches for measuring and compar- 
ing behaviors. As can be seen by looking at Table 2, the 
most rigorous approach uses observers from each culture 
studied to help determine if standardized measurements are 
made in those cultures. Using observers with such diverse 
backgrounds increases the chance that differences in the 
way behaviors are classified using an instrument, systematic 
measurement errors that may occur , and differences in the 
scalar identity of measurements made in each culture will 
be detected. 
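The distinctions drawn in Table 2 can be summarized as a short decision procedure. The Python sketch below is a simplification, not a rule stated in the monograph: the field names describing a study's procedures are hypothetical, it assigns only the broad levels A through D (not the numbered sub-approaches), it treats any use of different instruments as level D, and it ignores the further requirement, noted in Table 2, that systematic errors such as time unit distortion be controlled.

```python
"""Sketch: assigning a comparative study to the broad rigor levels of Table 2.

All field names are hypothetical summaries of a study's procedures.
"""
from dataclasses import dataclass


@dataclass
class Procedure:
    same_instrument: bool               # one category system used in every culture compared
    formal_agreement_checks: bool       # observers formally checked on the same events
    uses_prior_study_data: bool         # compares with measurements from an earlier study
    checked_against_prior_usage: bool   # usage formally checked against that earlier study
    observers_from_every_culture: bool  # each culture studied is represented among observers
    agreement_in_every_culture: bool    # the same observers re-checked in each culture


def rigor_level(p: Procedure) -> str:
    """Return the broad Table 2 level (A-D) that the procedure resembles."""
    if not p.same_instrument:
        return "D"  # different instruments used for behaviors with the same label
    if not p.formal_agreement_checks:
        return "C"  # like Approaches C2 and C3
    if p.uses_prior_study_data and not p.checked_against_prior_usage:
        return "C"  # like Approach C1
    if p.observers_from_every_culture and p.agreement_in_every_culture:
        return "A"  # the rigorous approach
    return "B"      # semi-rigorous: usage assumed to transfer across cultures


if __name__ == "__main__":
    # Hypothetical study: observers trained in Culture A then measure Culture B (like B1)
    study = Procedure(True, True, False, False, False, False)
    print(rigor_level(study))  # B
```

Under this ordering, a study that made no formal agreement checks falls into level C even if its observers came from every culture studied, mirroring the precedence implied by Table 2.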

Using the approaches indicated in Table 2 to classify 
the cross-cultural studies of behavior mentioned before yields 
the results shown in Table 3. As can be seen, nearly all of 



Standardizing 



TABLE 2

EXAMPLES OF TECHNIQUES FOR HELPING STANDARDIZE
MEASUREMENTS MADE USING CATEGORY SYSTEMS IN DIFFERENT CULTURES

A. A Rigorous Approach(a)

Observers from Culture A and Culture(s) B (C, D, ...)
observe behaviors of Culture A using a category system and
reach high agreement among measurements made. These same
observers, without prior discussion, observe behaviors of
Culture(s) B (C, D, ...) using the category system and again
reach high agreement among measurements made. Measurements
of behaviors made by these observers using the category
system are compared between the cultures.

B. Semi-Rigorous Approaches

Approach B1: Observers from Culture A observe behaviors
of Culture A using a category system and reach high levels of
agreement among measurements made. One or more of these
observers goes to Culture(s) B (C, D, ...) and either makes
measurements directly using the category system or trains
persons from Culture(s) B (C, D, ...) to use the category
system in that culture until their measurements agree highly
with those made by the Culture A observer(s). Measurements
made of Culture A, B (C, D, ...) behaviors using the category
system are then compared.

Approach B2: Similar to Approach B1 except that
observers from Culture(s) B (C, D, ...) take the initiative in
learning to use a category system developed in Culture A.
The Culture B (C, D, ...) observers reach high levels of
agreement with Culture A observers when observing Culture A
behaviors. Measurements made by Culture A observers in
Culture A are compared with measurements made by Culture B
(C, D, ...) observers in Culture(s) B (C, D, ...).

Approach B3: One or more observers use a category
system at or near a single location in one country to measure
the behaviors of two or more cultural groups. Formal checks
indicate that high agreement or stability is achieved
between measurements of the same events. Observer
backgrounds are similar to some but not all of the cultures
observed, or represent cultures different from those observed.
C. Approaches of More Questionable Rigor

Approach C1: An observer or observers attempt to use
a category system the same way as others have previously
used it, by reading descriptive materials, or by learning to
use it from a previous investigator. Although checks of
agreement may be made among observers using the instrument
in the new study or sub-study, formal checks are not made to
determine if use of the instrument in the new study is
similar to previous usage in other cultures. However,
measurements made in the different cultures and studies are
compared.

Approach C2: Observers reach understandings among
themselves about how a category system is to be used. These
observers then use the instrument in Cultures A, B (C, D, ...)
and compare measurements made in these cultures. Formal
checks of agreement are not made, however, to determine if
these observers agree highly among themselves when observing
and describing the same events.

Approach C3: An observer uses the same category system
to make observations in two or more cultures. Formal checks
of agreement and stability of usage are not made, however.

D. Highly Questionable Approach

Measurements are compared of behaviors having the same
general label but which were measured by different
instruments used by different investigators. No formal checks of
agreement are made to determine the equivalence of
measurements of the same events resulting from use of the different
instruments, although checks of agreement may be made between
observers using a particular instrument in any one of the
investigations which generated measurements.

(a) In this and other approaches described, standardization
of usage also requires that systematic measurement
errors such as "time unit distortion" be controlled and
eliminated.
TABLE 3

CLASSIFICATION OF CROSS-CULTURAL STUDIES OF BEHAVIOR
WITH RESPECT TO THE RIGOR OF PROCEDURES USED
TO STANDARDIZE MEASUREMENTS COMPARED

A. Rigorous Approach

(no studies were identified which used such an approach)

B. Semi-Rigorous Approach

Approach B1:
    Caudill and Weinstein (1969)
    Berkowitz (1971)
    Caudill and Frost (1974)
    Lester et al. (1974)
    Ray and Ray (1976)
    Pfau (1977)
    Shuter (1977)
Approach B2:
    Hacker, Hawkes, and Heffernan (1979)
Approach B3:
    Watson and Graves (1966)
    Baxter (1970)
    Payne (1970)
    Aiello and Jones (1971)
    Brislin (1971)
    Tulkin and Cohler (1973)
    Greenbaum and Landau (1977)
    Moss and Jones (1977)
    Tulkin (1977)
    Shuter (1979)

C. Approaches of More Questionable Rigor

    Tisher (1970)
    Barker and Barker (1963, 1978)
    Schoggen, Barker, and Barker (1963, 1978)
    Caudill and Schooler (1973)(a)
    Whiting and Whiting (1975)
    Birrell (1974)
    Richards and Finger (1975)(b)
    Jenni (1976)
    Lewis and Ban (1977)

D. Highly Questionable Approach

    Konner (1977, pp. 294-295)(c)
    Minge-Klevana (1980)(c)

(a) Agreement checks in this study dealt with only the
second step of the specimen record procedure used (i.e.,
agreement between codings of the same specimen records made
by different observers) but did not deal with the first step
(i.e., the degree to which specimen records of the same
events made by different observers were similar).

(b) Although observer agreement checks were not reported
in this study, the behaviors classified were so obvious that
some persons may consider this study to represent a
semi-rigorous approach.

(c) These authors were aware of inadequacies in the data
compared.
the studies reviewed have used "semi-rigorous" or "more
questionable" approaches to help standardize the
measurements compared.

A major problem of the "semi-rigorous" approaches is that
observer drift may have occurred when the observation
instrument was used to make measurements of different cultural
groups (Kazdin, 1977; Longabaugh, 1980, pp. 107-109). That is,
those who conducted studies using these approaches assumed that
an observer or observers who used an observation instrument in
standard ways when observing members of one culture transferred
standardized usage of the instrument to other cultures when
measurements were made. This is an untested assumption of
these studies, and can be viewed as a limitation of them
and of "semi-rigorous" approaches in general.

The procedures used in the "more questionable" studies,
besides not controlling for observer drift in usage, led to
comparisons being made of data whose precision, as indicated
by tests of observer agreement, is unknown. This means that
the scalar identity of measurements made in these studies is
open to even more question than that of measurements made
using the "semi-rigorous" approaches, and this is considered
to be a serious limitation of these "more questionable" studies.

However, perhaps an even greater limitation of nearly all
of the studies reviewed is that almost none have established a
sufficient basis so that future researchers who may wish to
gather and compare data with these past studies can ensure that
their use of the "same" instrument is indeed the same. That
is, almost none of the investigators who conducted these studies
has provided or otherwise retained enough information to permit
standardization of observation instrument usage to be achieved
between these past studies and future studies which these
researchers or others may wish to conduct. This means that, in
most cases, the scalar identity of measurements made during these
past studies and during future studies cannot be estimated, nor
can many systematic measurement errors which may have occurred
in these studies be detected. As a result, comparisons of
measurements made in the future with those made in most of these
past studies will be hazardous.

A Suggested Approach 

Techniques for helping overcome the problems of
standardizing measurements made using category systems have
already been indicated in Table 2. That is, one of the more rigorous
approaches described in that table can be used to help ensure
that observers are making standardized measurements of different
cultures at approximately the same time. However, the procedures
indicated do not help to ensure that observers in the future will
use an observation instrument as it was used in the past. This
is so because even expert observers may modify their use of an
instrument over time. They may also die. The observer drift in
usage over time that may result needs to be controlled if
standardized measurements are to be made by observers at different
times, either in the same or in different studies.

A way of overcoming this time-related problem as well 
as the cross-cultural problems discussed before is the following: 

1. Preserve Samples of Behavior Observed

Instrument developers and users would, according to
this proposed approach, preserve samples of behaviors
measured on movie film, audio-video tape recordings, audio
recordings, or photographs, the exact media used being
dependent upon the types of behavior measured. Relevant
contextual information which is not evident from the
preserved behavioral recordings should also be described in
sufficient detail so that future observers will have enough
information to accurately code the behaviors preserved when
they use the instrument.

2. Make Standard Codings of These Preserved Behaviors
The preserved samples of behavior should then be coded
by an "expert" observer or by typical observers who
participated in the study whose instrument usage is being
preserved for future reference. The codings made will
constitute a set of preestablished standards against which
future measurements can be compared.

3. Prepare Instrument Descriptions
Sufficient information about the instrument used,
including other sets of preserved behavior samples and
associated "standard codings," should be prepared so that
future users can train themselves and others to use the
instrument the same way it was used in the past ("Where do
. . . ," 1965; Thiagarajan, 1973). These materials should
be made available for future use by others.
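Steps 1 through 3 amount to keeping a structured archive alongside the instrument. As one possible layout, assumed here for illustration (the monograph prescribes no format), the Python sketch below stores each preserved behavior sample with its medium, the contextual notes that the recording itself does not show, and its standard coding, and it rejects standard codings that use categories the instrument does not define. All class, field, and category names are hypothetical.

```python
"""Sketch: an archive of preserved behavior samples and their standard codings.

The structure and every name in it are hypothetical illustrations.
"""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PreservedSample:
    sample_id: str
    medium: str                 # e.g., "movie film", "videotape", "audio recording", "photograph"
    storage_location: str       # where the recording itself is kept
    context_notes: str          # context not evident from the recording
    standard_coding: List[str]  # codes assigned by the "expert" or typical study observers


@dataclass
class InstrumentArchive:
    instrument_name: str
    category_definitions: Dict[str, str]  # category code -> written definition
    recording_method: str                 # e.g., "event", "duration", "interval"
    samples: List[PreservedSample] = field(default_factory=list)

    def add_sample(self, sample: PreservedSample) -> None:
        """Store a sample, refusing codes that the instrument does not define."""
        undefined = set(sample.standard_coding) - set(self.category_definitions)
        if undefined:
            raise ValueError(f"undefined category codes: {sorted(undefined)}")
        self.samples.append(sample)


if __name__ == "__main__":
    archive = InstrumentArchive(
        instrument_name="hypothetical classroom category system",
        category_definitions={"L": "teacher lecturing", "Q": "teacher questioning",
                              "P": "pupil talk"},
        recording_method="event",
    )
    archive.add_sample(PreservedSample(
        "s1", "videotape", "archive shelf 3",
        "grade 5 mathematics lesson; pupils seated in rows",
        ["L", "L", "Q", "P"]))
    print(len(archive.samples))  # 1
```

Keeping the category definitions and the standard codings in the same record makes it harder for a future user to train against codes that the preserved instrument never defined.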

4. Future Users Test Their Usage

The materials prepared would then be used to train
new observers to make standardized measurements using the
instrument and to help detect and correct observer drift
from standardized usage which may occur in a study. For
example, after training, observers would code previously
unseen samples of the preserved behavioral records, and their
measurements would be compared with the preestablished
standard measurements of those records. If high agreement
is reached, this will indicate that the new observers are
making measurements in a standardized manner. After high
agreement is reached, additional checks using the preserved
materials can be made from time to time to help detect and
correct observer drift from standardized usage which may
occur (see Roebuck, Aspy, Sadler, & Willson, 197**, for
how this has been done in the past).
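The testing of future users described above can be sketched as follows, assuming (hypothetically) that codings are lists of category codes keyed by preserved-sample identifiers and that "high agreement" means at least 85 percent of units coded identically to the standard coding. The monograph sets no numerical criterion; the threshold, names, and data are illustrative only.

```python
"""Sketch: testing an observer against preestablished standard codings, and
flagging observer drift on later rechecks.

The 0.85 criterion and all names are hypothetical.
"""
from typing import Dict, List

CRITERION = 0.85  # hypothetical minimum proportion of units matching the standard


def agreement_with_standard(codings: Dict[str, List[str]],
                            standards: Dict[str, List[str]]) -> float:
    """Pool all units of the samples the observer coded and return the
    proportion coded identically to the standard coding."""
    matched = total = 0
    for sample_id, codes in codings.items():
        standard = standards[sample_id]
        if len(codes) != len(standard):
            raise ValueError(f"sample {sample_id}: wrong number of coded units")
        matched += sum(c == s for c, s in zip(codes, standard))
        total += len(standard)
    if total == 0:
        raise ValueError("no coded units supplied")
    return matched / total


def first_drift(recheck_scores: List[float], criterion: float = CRITERION) -> int:
    """Index of the first periodic recheck that falls below the criterion,
    or -1 if none does."""
    for i, score in enumerate(recheck_scores):
        if score < criterion:
            return i
    return -1


if __name__ == "__main__":
    standards = {"s1": ["L", "L", "Q", "P"], "s2": ["Q", "Q", "L", "L", "P", "P"]}
    trainee = {"s1": ["L", "L", "Q", "P"], "s2": ["Q", "Q", "L", "L", "P", "L"]}
    score = agreement_with_standard(trainee, standards)
    print(score, score >= CRITERION)        # 0.9 True
    print(first_drift([0.93, 0.90, 0.81]))  # 2
```

A recheck that falls below the criterion would signal that retraining against the preserved materials is needed before further measurements are compared.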

5. Determine Standardization Across Cultures or Time
The techniques outlined in Table 2 could then be used
to help standardize measurements made across cultures and
longer periods of time.

For example, the procedures described above could be
used with the Rigorous Approach described in Table 2 by
having observers from Cultures A and B (C, D, ...) receive
training until their measurements of the preserved
behavioral recordings agree highly with the preestablished
standard codings originally prepared. These observers
would then jointly observe behaviors in Cultures A, B (C, D,
...) and determine if measurements of the same behavioral
events they make in those cultures also agree highly. If
agreement is reached, then standardization of measurements
made in those cultures is indicated, and comparisons of
those measurements can be made.

Similarly, measurements made at one time (t1) can be
compared with measurements made using the same observation
instrument in another culture or in the "same" culture at
a future time (t2). This could be done by having observers
receive training using preserved behavioral records and
training materials prepared when the instrument was used at
the earlier time (t1). After observers reach high agreement
with the preestablished standards associated with the
training materials, measurements could be made at the future
time (t2) and compared with those made at the earlier time
(t1). This technique, if followed, represents Approach B2
of Table 2, where Culture A is the behavioral situation
measured at time t1 and Culture B is either a quite different
culture at time t2 or a variation of Culture A which has
evolved over time.
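The checks in step 5 act as a gate: measurements from different cultures, or from an earlier and a later time, are compared only after every observer agrees highly with the preserved standard codings and, where observers work jointly, agree highly with one another on the same live events. The sketch below expresses that gate in Python; the 0.85 criterion, the dictionary layout, and all names are hypothetical, and the agreement scores are assumed to have been computed beforehand.

```python
"""Sketch: deciding whether standardized comparisons may be made (step 5).

The criterion and all names are hypothetical; agreement scores are proportions.
"""
from typing import Dict, List

CRITERION = 0.85  # hypothetical threshold for "high agreement"


def comparison_blockers(standard_checks: Dict[str, float],
                        joint_checks: Dict[str, float],
                        criterion: float = CRITERION) -> List[str]:
    """List every check that failed; an empty list means comparisons may be made.

    standard_checks: observer -> agreement with the preserved standard codings
    joint_checks: culture or time -> agreement among observers on the same events
    """
    blockers = [f"observer {name} below standard ({score:.2f})"
                for name, score in standard_checks.items() if score < criterion]
    blockers += [f"joint agreement in {setting} too low ({score:.2f})"
                 for setting, score in joint_checks.items() if score < criterion]
    return blockers


if __name__ == "__main__":
    standard_checks = {"observer from A": 0.91, "observer from B": 0.88}
    joint_checks = {"Culture A": 0.90, "Culture B": 0.79}
    print(comparison_blockers(standard_checks, joint_checks))
    # ['joint agreement in Culture B too low (0.79)']
```

For a comparison across time in which the later observers never observe jointly with the earlier ones, `joint_checks` would simply be empty and the gate would rest on agreement with the preserved standards alone.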

Some Needed Research and Thinking

Although the suggestions made in the previous sections 
provide a framework for discussion and action, additional 
information and thinking are needed if standardization is 
to be achieved with confidence. For example, information 
helpful to persons wishing to standardize behavioral 
measurements across cultures and time would be provided by
research which answered the following questions. 

1. Can an observer who uses an observation instrument
in a standard way in one culture transfer standardized
usage of the instrument to another culture (or language)
when he or she observes behaviors in the second culture or
trains others to do so? An assumption of the widely used
"Semi-Rigorous Approaches" described in Table 2 is that,
yes, such transferability of standardized usage can and does
occur. As indicated before, this assumption is yet to be
tested.

2. What techniques should investigators use to achieve 
standardization of observation instrument usage across 
cultures? The "Rigorous Approach" is one possible procedure. 
Is this approach sufficient? Is it too rigorous? Are other 
approaches more practical and satisfactory? 

3. What should be done if observers differ in the
measurements they make in different cultures? Caudill, for
instance, found that measurements made by observers in
Japan and in the U.S.A. differed somewhat from his own
(his being the standard against which theirs were judged).
In order to make the measurements of these observers more
equivalent, Caudill used a "weighting" procedure to adjust
their scores. Are such weighting procedures a promising
approach to use when differences in instrument usage are
found to occur across cultures? (See Caudill and Weinstein,
1969, pp. 24-25, and Caudill and Frost, 1973, p. 7, for
details.)
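The monograph does not say how Caudill's weights were computed. The Python sketch below is purely an illustration of what a score-weighting adjustment of this general kind could look like, and it is not Caudill's procedure. It rests on two hypothetical assumptions. First, each local observer and the standard observer coded the same calibration sessions. Second, the local observer's deviation in each behavior category is proportional, so a per-category ratio of means can rescale that observer's field scores. The function names and the ratio-of-means rule are invented for this example.

```python
from statistics import mean


def calibration_weights(standard_scores, observer_scores):
    """Hypothetical per-category weight: the standard observer's mean score
    divided by the local observer's mean score on shared calibration sessions.

    Both arguments map a behavior category to a list of scores."""
    weights = {}
    for category, std_values in standard_scores.items():
        obs_mean = mean(observer_scores[category])
        if obs_mean == 0:
            raise ValueError(f"observer never recorded {category!r}; weight undefined")
        weights[category] = mean(std_values) / obs_mean
    return weights


def adjust(scores, weights):
    """Rescale a local observer's field scores category by category."""
    return {category: [v * weights[category] for v in values]
            for category, values in scores.items()}
```

If an observer's discrepancies were additive rather than proportional, a difference-of-means correction would be the analogous choice. Which model fits a given observer is itself one of the empirical questions posed in this section.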

4. Does the two-step procedure involved in measuring
behavior by first writing specimen records or protocols and
then coding these written descriptions result in measurements
which are accurate enough to compare across cultures?
There are some indications that such may not be the case
(Spain and Hollenbeck, 1975; Levine, 1977). The extent to
which this procedure can be standardized across cultures
needs to be studied more, given the fairly widespread use
of such descriptions for comparative purposes.

In addition, several other areas related to the use of
category systems in comparative studies are in need of
thought and investigation. These include the questions
of (a) how category systems should be developed or modified
to best ensure that they are suitable for the comparative
purposes for which they are to be used, and (b) what additional
techniques should be used and what additional
information should be gathered so that the measurements
provided by category system usage can be validly interpreted
beyond the specific events quantified (since category
systems, by themselves, do not provide much of a basis for
understanding and explaining the events measured). Although
some information exists concerning these areas (e.g., Goodenough,
1970, chap. 4; Pfau, 1981, pp. 31-34), more is needed.

It is hoped that efforts will soon be made to answer
these questions and others which this monograph will surely
raise. It is also hoped that the suggestions made in this
essay will provide a useful guide for action until standards
for measuring behavior using direct observation techniques
are more formally established by a consensus of concerned
scholars (Standardization Basics, 1977a, 1977b).
Footnotes

I would like to thank Richard W. Brislin and Carmi 
Schooler for commenting upon an earlier draft of this monograph 
which is based upon a paper presented at the annual meeting 
of the Society for Cross-Cultural Research, Syracuse, New 
York, February 1981. 

A visit during July 1981 to the National Bureau of
Standards reference collections at Gaithersburg, Maryland,
inquiries to the American National Standards Institute
(ANSI), to the American Society for Testing and Materials
(ASTM), and to the International Organization for
Standardization (ISO), and a review of related literature
yielded no standards for measuring the occurrence of human or
animal behavior except for some concerning psychological testing.

Scalar identity is considered necessary if the scores
of culturally different groups are to be compared, according
to Davidson (1977, p. 50).

The author welcomes information from readers about
other cross-cultural studies which used category systems to
measure naturally occurring behaviors in non-experimental
settings.

Although the two-step (i.e., making a written
description which is then coded) specimen record and
protocol approaches provide a great deal of rich contextual
information about behaviors observed, they seem to result in
measurements which are less precise than those obtained by
directly coding behavior observed (Spain and Hollenbeck,
1975; Levine, 1977). Checklists also seem to yield measurements
which are less precise than those provided by other
kinds of category systems, are more prone to distorting
estimates of the extent to which behaviors occur, and do
not lend themselves as well to the study of sequences of
events (Dunkin and Biddle, 1974, p. 71). As a result, these
variations do not seem to lend themselves as well to
standardizing measurements across cultures, and are
differentiated from other kinds of category systems in this
article for that reason.

This does not mean that scalar identity may not have
been approximated in some cases, nor that a great deal of
thought-provoking and useful data was not gathered by many
of these studies. However, the degree to which measurements
made in these studies were standardized is open to question.

Such observer drift is an example of what Campbell
and Stanley (1966) call "instrument decay."

For example, if the locations of persons in a room are
being studied, photographs may be sufficient. If an analysis
of verbal behavior is being conducted, audio tape recordings
may suffice. It should be noted, though, that specimen
records, transcripts, or other kinds of narrative descriptions
are not considered to be suitable for the preservation
of realia as required by this step.

See Herbert and Attridge, 1975, for guidelines about
what to include in such training materials.